Fast thread communication and synchronization mechanisms for a scalable single chip multiprocessor
Abstract
Much of the improvement in computer performance over the last twenty years has come from faster transistors and architectural advances that increase parallelism. Smaller feature sizes have decreased the transistor switching time but at the same time increased the resistance of interconnect wires, resulting in slower signal transmission in on-chip wiring. Since future chips will have more silicon area and include more execution units, a much larger demand for parallelism is emerging. However, the increased significance of wire delay will require that monolithic components, such as processors and caches, be small and that the communication wires connecting them be short.
Computer systems typically exploit concurrency using either instruction-level parallelism (ILP) or coarse-grain parallel threads running on a multiprocessor. This thesis proposes mechanisms for exploiting on-chip parallelism at a fine grain to bridge the gap between ILP and coarse-grain multiprocessing. Fast interprocessor communication and synchronization enable the use of tasks with run lengths as small as 10 cycles. At the same time, these interaction mechanisms are less susceptible than conventional microprocessor designs to the longer wire delays imminent in future silicon process technologies. As fine-grain parallelism is orthogonal to ILP and coarse-grain threads, it complements both methods and provides an opportunity for greater speedup.
This thesis presents the architecture and implementation of the MIT Multi-ALU Processor (MAP), a 5 million transistor custom VLSI microprocessor chip. The MAP architecture incorporates 9 function units, split into 3 independent processors. The processors communicate via interprocessor register writes and synchronize using a hardware barrier instruction. These integrated mechanisms allow threads to communicate 10 times faster and synchronize 60 times faster than using a shared on-chip cache.
The fast interprocessor interaction enables the MAP to exploit both instruction-level parallelism and fine-grain thread-level parallelism. On a suite of applications, speedups of 1.2 to 2.4 are achieved using fine-grain threads on a 3-processor MAP chip.
Thesis Supervisor: Dr. William J. Dally
Title: Professor of Electrical Engineering and Computer Science
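The communicate-then-synchronize pattern described in the abstract can be illustrated with a rough software analogy. The sketch below is not the MAP's mechanism: it uses ordinary Python threads, a hypothetical `mailboxes` list standing in for remotely written registers, and `threading.Barrier` standing in for the hardware barrier instruction. It shows only the ordering guarantee (no worker reads its mailbox until every worker has written), not the latencies reported in the thesis.

```python
import threading


def run_ring_exchange(values):
    """Each worker writes its value to its right neighbour, then all synchronize.

    Hypothetical illustration: `mailboxes` plays the role of registers written
    by another processor; the barrier plays the role of a hardware barrier.
    """
    n = len(values)
    mailboxes = [None] * n           # one slot per worker, written by a neighbour
    results = [None] * n
    barrier = threading.Barrier(n)   # all n workers must arrive before any proceeds

    def worker(i):
        # Analogue of an interprocessor register write: deposit a value
        # directly into the neighbour's slot.
        mailboxes[(i + 1) % n] = values[i]
        # Analogue of the barrier: wait until every worker has finished writing.
        barrier.wait()
        # Safe to read now; combine the received value with the local one.
        results[i] = mailboxes[i] + values[i]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Three workers, mirroring the three processors on the MAP chip.
    print(run_ring_exchange([1, 2, 3]))
```

In software, the barrier and thread scheduling cost far more than the work done between them here, which is the overhead that the abstract's hardware mechanisms are meant to reduce so that tasks of roughly 10 cycles become worthwhile.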
Similar resources
An On-Chip Multiprocessor Architecture with a Non-Blocking Synchronization Mechanism
tive to superscalar architectures [5][8][12][13]. Strengths of an on-chip MP architecture are threefold. First, an MP can exploit different level parallelism, thread-level parallelism (TLP), in addition to ILP. Second, the complexity can be suppressed using simple processors. This ensures a high clock rate. Third, communication latency can be significantly reduced using an on-chip network. Thes...
Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor
Thread-level speculation (TLS) makes it possible to parallelize general purpose C programs. This paper proposes software and hardware mechanisms that support speculative thread-level execution on a single-chip multiprocessor. A detailed analysis of programs using the TLS execution model shows a bound on the performance of a TLS machine that is promising. In particular, TLS makes it feasible to ...
A Flexible, Efficient Concurrent Garbage Collector for Speculative Thread Processors
Michael Chen and Kunle Olukotun Computer Systems Lab, Stanford University Abstract In this paper, we introduce a novel garbage collector for Java to be used for processors with speculative threads support like the Hydra chip multiprocessor (CMP). Thread speculation permits parallel execution of sections of sequential code with data dependencies enforced in the hardware, eliminating the need for...
Compiler Optimization of Value Communication for Thread-Level Speculation
In the context of Thread-Level Speculation (TLS), inter-thread value communication is the key to efficient parallel execution. From the compiler’s perspective, TLS supports two forms of inter-thread value communication: speculation and synchronization. Speculation allows for maximum parallel overlap when it succeeds, but becomes costly when it fails. Synchronization, on the other hand, introduc...
A design study of the EARTH multiprocessor
Multithreaded node architectures have been proposed for future multiprocessor systems. However, some open issues remain: can efficient multithreading support be provided in a multiprocessor machine such that it is capable of tolerating synchronization and communication latencies, with little intrusion on the performance of sequentially-executed code? Also, how much (quantitatively) does such non-...